Abstract: A simple outer bound for the multiterminal source
coding problem is given in terms of Hirschfeld-Gebelein-Renyi 
maximal correlation. Compared to other standard outer bounds 
which are parameterized by auxiliary random variables, the 
proposed outer bound has the advantage of (i) being efficiently 
computable, and (ii) having an explicit expression. 

I. Introduction 

We begin with a discussion of the two-encoder quadratic
Gaussian source coding problem¹ in order to motivate our
main result. To this end, suppose $X, Y$ are jointly Gaussian,
each with unit variance and with correlation coefficient $\rho$, and
distortion is measured under mean square error. In this setting,
the set of achievable rate distortion tuples is given by all
$(R_X, R_Y, D_X, D_Y)$ satisfying



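    R_X \geq \frac{1}{2} \log\left( \frac{1}{D_X} \left( 1 - \rho^2 + \rho^2 2^{-2R_Y} \right) \right)    (1)

    R_Y \geq \frac{1}{2} \log\left( \frac{1}{D_Y} \left( 1 - \rho^2 + \rho^2 2^{-2R_X} \right) \right)    (2)

    R_X + R_Y \geq \frac{1}{2} \log\left( \frac{(1 - \rho^2)\, \beta(D_X, D_Y)}{2 D_X D_Y} \right),    (3)

where

    \beta(D_X, D_Y) = 1 + \sqrt{ 1 + \frac{4 \rho^2 D_X D_Y}{(1 - \rho^2)^2} }.    (4)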



Long before the converse result was completed in [1], it
was known that any $(R_X, R_Y, D_X, D_Y)$ satisfying (1)-(3)
was achievable. Indeed, the tuples $(R_X, R_Y, D_X, D_Y)$ satisfying (1)-(3)
correspond to a set of points in the Berger-Tung achievable
region attained by Gaussian test channels [2], [3]. Moreover,
roughly a decade before the sum-rate lower bound (3) was
established in [1], it was proven by Oohama [4] that (1)-(2)
were necessary conditions for $(R_X, R_Y, D_X, D_Y)$ to be
achievable. Thus, in the period between the publication of
[4] and [1], ad-hoc lower bounds on the sum-rate could be
established as follows.
under grant agreement CCF-0939370.

¹ We assume the reader has some familiarity with the multiterminal source
coding problem. For those who are unfamiliar, a formal definition of the
problem is given in Section II.
Fig. 1. Comparison of Eqns. (3), (9), and (10) for $\rho = 1/5$.



Noting that the right-hand sides of (1) and (2) are convex
in $R_Y$ and $R_X$, respectively, it is straightforward to establish
the necessity of

    R_X + \rho^2 R_Y \geq \frac{1}{2} \log \frac{1}{D_X}    (5)

    R_Y + \rho^2 R_X \geq \frac{1}{2} \log \frac{1}{D_Y}    (6)

in order for $(R_X, R_Y, D_X, D_Y)$ to be achievable. Indeed, this
can be seen by linearizing the RHS of (1) at $R_Y = 0$:



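    R_X \geq \frac{1}{2} \log\left( \frac{1}{D_X} \left( 1 - \rho^2 + \rho^2 2^{-2R_Y} \right) \right)    (7)

        \geq \left. \frac{1}{2} \log\left( \frac{1}{D_X} \left( 1 - \rho^2 + \rho^2 2^{-2R_Y} \right) \right) \right|_{R_Y = 0}
             + \left. \frac{\partial}{\partial R_Y} \frac{1}{2} \log\left( \frac{1}{D_X} \left( 1 - \rho^2 + \rho^2 2^{-2R_Y} \right) \right) \right|_{R_Y = 0} R_Y

        = \frac{1}{2} \log \frac{1}{D_X} - \rho^2 R_Y.    (8)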
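
The slope used in this linearization can be checked directly. The following worked step is a sketch that assumes logarithms are taken to base 2 (consistent with the factor $2^{-2R_Y}$); it differentiates the right-hand side of (1) with respect to $R_Y$ and evaluates the result at $R_Y = 0$:

```latex
\frac{\partial}{\partial R_Y} \, \frac{1}{2} \log_2\!\left( \frac{1}{D_X}\left(1 - \rho^2 + \rho^2 2^{-2R_Y}\right) \right)
  = \frac{1}{2 \ln 2} \cdot \frac{-2 \ln 2 \cdot \rho^2 2^{-2R_Y}}{1 - \rho^2 + \rho^2 2^{-2R_Y}}
  = -\frac{\rho^2 2^{-2R_Y}}{1 - \rho^2 + \rho^2 2^{-2R_Y}},
\qquad
\left. -\frac{\rho^2 2^{-2R_Y}}{1 - \rho^2 + \rho^2 2^{-2R_Y}} \right|_{R_Y = 0} = -\rho^2 .
```

The tangent line at $R_Y = 0$ therefore has intercept $\frac{1}{2} \log \frac{1}{D_X}$ and slope $-\rho^2$, and a convex function lies above each of its tangent lines.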



Thus, a simple sum-rate lower bound in the quadratic Gaussian
setting is obtained by adding (5) and (6):

    R_X + R_Y \geq \frac{1}{2 (1 + \rho^2)} \log \frac{1}{D_X D_Y}.    (9)

In Figure 1, we have compared the lower bound (9) against
the optimal sum-rate constraint (3) for $\rho = 1/5$. As
the figure shows, the simplified sum-rate
lower bound (9) provides a strikingly tight approximation to
(3).

In Figure 2, we consider more highly correlated sources
with $\rho = 4/5$. As the reader will notice, the accuracy with
which (9) approximates (3) worsens as $D_X D_Y$ becomes small.
This is to be expected since (9) was obtained by considering
hyperplanes which support the rate-distortion region when one
rate is zero (i.e., in the low-resolution regime). This situation
can be remedied in part by recalling known results for source
coding in the high-resolution regime (cf. [5, Equation (2c)]):

    R_X + R_Y \geq \frac{1}{2} \log \frac{1 - \rho^2}{D_X D_Y}.    (10)

Taking the maximum of (9) and (10) then yields a fairly
accurate approximation of (3).
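
The comparison described above can be reproduced numerically. The following is a minimal sketch, assuming base-2 logarithms and the forms of (3), (4), (9), and (10) spelled out in the docstrings; the function names are illustrative and not taken from the paper.

```python
import math

def beta(dx, dy, rho):
    """Assumed form of (4): 1 + sqrt(1 + 4 rho^2 Dx Dy / (1 - rho^2)^2)."""
    return 1.0 + math.sqrt(1.0 + 4.0 * rho**2 * dx * dy / (1.0 - rho**2) ** 2)

def sum_rate_optimal(dx, dy, rho):
    """Assumed form of (3): (1/2) log2((1 - rho^2) beta / (2 Dx Dy))."""
    return 0.5 * math.log2((1.0 - rho**2) * beta(dx, dy, rho) / (2.0 * dx * dy))

def sum_rate_low_res(dx, dy, rho):
    """Assumed form of (9): log2(1 / (Dx Dy)) / (2 (1 + rho^2))."""
    return math.log2(1.0 / (dx * dy)) / (2.0 * (1.0 + rho**2))

def sum_rate_high_res(dx, dy, rho):
    """Assumed form of (10): (1/2) log2((1 - rho^2) / (Dx Dy))."""
    return 0.5 * math.log2((1.0 - rho**2) / (dx * dy))

if __name__ == "__main__":
    # Tabulate the three bounds for symmetric distortions Dx = Dy = d.
    for rho in (0.2, 0.8):
        for d in (0.9, 0.5, 0.1):
            opt = sum_rate_optimal(d, d, rho)
            approx = max(sum_rate_low_res(d, d, rho),
                         sum_rate_high_res(d, d, rho))
            print(f"rho={rho} d={d} (3)={opt:.4f} max((9),(10))={approx:.4f}")
```

Under these assumed forms, the low-resolution bound touches the optimal constraint as $D_X D_Y \to 1$, while the high-resolution bound takes over as $D_X D_Y$ becomes small.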
Fig. 2. Comparison of Eqns. (3), (9), and (10) for $\rho = 4/5$.

Admittedly, our derivation of (9) was ad-hoc and required
the necessity of (1) and (2), which was established by Oohama in
[4] many years after the multiterminal source coding problem
was posed. Thus, it is desirable to establish a generalization
of (5) and (6) to arbitrary sources and distortion measures
which does not require known converse results for the specific
problem instance under consideration. This generalization is
precisely what we prove in this paper.

Section II delivers our main result. Two alternate proofs are
given in Section III along with a brief discussion. Section IV
summarizes our conclusions.



II. Definitions and Main Result 

Definition 1. Define $\rho_m(X, Y)$ to be the Hirschfeld-Gebelein-Rényi
maximal correlation between random variables $X$ and
$Y$. Formally,

    \rho_m(X, Y) = \sup \mathbb{E}[f(X) g(Y)],    (11)

where the supremum is taken over all functions $f, g$ satisfying
$\mathbb{E} f(X) = \mathbb{E} g(Y) = 0$ and $\mathbb{E} f^2(X) = \mathbb{E} g^2(Y) = 1$.

We remark that $\rho_m(X, Y) \in [0, 1]$ as a consequence of the
Cauchy-Schwarz inequality.

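Definition 2. For a random variable $X$ with alphabet $\mathcal{X}$,
a reproduction alphabet $\hat{\mathcal{X}}$, and a distortion function $d_x :
\mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$, let $\mathbb{R}_X(D_X)$ denote the corresponding rate
distortion function. That is,

    \mathbb{R}_X(D_X) = \min_{p(\hat{x}|x) \,:\, \mathbb{E}[d_x(X, \hat{X})] \leq D_X} I(X; \hat{X}).    (12)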

Definition 3. Assume $\{X_i, Y_i\}_{i=1}^{\infty}$ is a 2-DMS with joint
distribution $P_{X,Y}$. A rate distortion tuple $(R_X, R_Y, D_X, D_Y)$
is achievable for distortion functions $d_x, d_y$ if, for any $\epsilon > 0$,
there exists an integer $n$, encoding functions

    f_x : \mathcal{X}^n \to \{1, \dots, 2^{nR_X}\}    (13)
    f_y : \mathcal{Y}^n \to \{1, \dots, 2^{nR_Y}\},    (14)

and decoding functions

    \phi_x : \{1, \dots, 2^{nR_X}\} \times \{1, \dots, 2^{nR_Y}\} \to \hat{\mathcal{X}}^n    (15)
    \phi_y : \{1, \dots, 2^{nR_X}\} \times \{1, \dots, 2^{nR_Y}\} \to \hat{\mathcal{Y}}^n    (16)

which satisfy

    \mathbb{E}[d_x(X^n, \phi_x(f_x(X^n), f_y(Y^n)))] \leq D_X + \epsilon    (17)
    \mathbb{E}[d_y(Y^n, \phi_y(f_x(X^n), f_y(Y^n)))] \leq D_Y + \epsilon.    (18)

We remark that distortion between two sequences is defined
as the average per-symbol distortion (as usual).

Theorem 1. Suppose $(R_X, R_Y, D_X, D_Y)$ is an achievable
rate distortion tuple for distortion functions $d_x, d_y$. Then

    R_X + \rho_m^2(X, Y) R_Y \geq \mathbb{R}_X(D_X)    (19)
    R_Y + \rho_m^2(X, Y) R_X \geq \mathbb{R}_Y(D_Y).    (20)

An immediate corollary of Theorem 1 is the sum-rate lower
bound

    R_X + R_Y \geq \frac{1}{1 + \rho_m^2(X, Y)} \left( \mathbb{R}_X(D_X) + \mathbb{R}_Y(D_Y) \right).    (21)

We remark that if $X, Y$ are jointly Gaussian with correlation
coefficient $\rho$, we have that $\rho^2 = \rho_m^2(X, Y)$. Thus, Theorem 1
and (21) generalize the bounds (5), (6), and (9) to any choice
of sources and distortion measures as desired.
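
The corollary follows by adding the two inequalities of Theorem 1. As a short worked step, writing $\rho_m^2$ for $\rho_m^2(X, Y)$ and $\mathbb{R}_X$, $\mathbb{R}_Y$ for the rate distortion functions of Definition 2:

```latex
\left(R_X + \rho_m^2 R_Y\right) + \left(R_Y + \rho_m^2 R_X\right)
  = \left(1 + \rho_m^2\right)\left(R_X + R_Y\right)
  \geq \mathbb{R}_X(D_X) + \mathbb{R}_Y(D_Y)
\quad\Longrightarrow\quad
R_X + R_Y \geq \frac{\mathbb{R}_X(D_X) + \mathbb{R}_Y(D_Y)}{1 + \rho_m^2} .
```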



A. Discussion 

Roughly speaking, (21) implies that, as long as $X, Y$ are
not highly correlated, compressing with an optimal scheme
(which exploits correlation to the fullest extent) provides little
savings in attainable sum-rate over treating the sources as if
they were independent. For example, quaternary sources with
joint distribution given by

    P_{X,Y}(x, y) = \begin{cases} 1/10 & \text{if } x = y \\ 1/20 & \text{if } x \neq y \end{cases}    (22)

have $\rho_m^2(X, Y) = 0.04$. Hence, separate encoding of $X$ and
$Y$ at rates $\mathbb{R}_X(D_X)$ and $\mathbb{R}_Y(D_Y)$ incurs at most a 4% penalty
in sum-rate over an optimal scheme (regardless of which
distortion measures are considered).

A clear benefit of Theorem 1 over other outer bounds is
that it yields an efficiently computable outer bound. Indeed,
$\rho_m^2(X, Y)$ is equal to the maximum eigenvalue of a linear
transformation defined by the joint distribution of $X$ and $Y$,
and is therefore computable with polynomial complexity in
$|\mathcal{X}| \times |\mathcal{Y}|$. We refer the reader to Rényi's original paper [6]
for complete details on computing $\rho_m^2(X, Y)$.
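
For finite alphabets, one way to carry out this computation can be sketched as follows. This is not necessarily the procedure of [6]: it relies on the standard fact that $\rho_m(X, Y)$ is the second-largest singular value of the matrix $Q$ with entries $P_{X,Y}(x, y) / \sqrt{P_X(x) P_Y(y)}$, whose largest singular value is always 1. The function name `maximal_correlation_sq`, the power-iteration approach, and the iteration count are illustrative assumptions, and all marginal probabilities are assumed to be strictly positive.

```python
import math
import random

def maximal_correlation_sq(p_xy, iters=500):
    """Return rho_m^2(X, Y) for a joint pmf given as a list of rows.

    Assumes every row and every column of p_xy has positive mass.
    """
    nx, ny = len(p_xy), len(p_xy[0])
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(p_xy[i][j] for i in range(nx)) for j in range(ny)]
    # Q[x][y] = P(x, y) / sqrt(P_X(x) P_Y(y)); its top singular value is 1.
    q = [[p_xy[i][j] / math.sqrt(p_x[i] * p_y[j]) for j in range(ny)]
         for i in range(nx)]
    v = [math.sqrt(a) for a in p_x]  # top left-singular vector of Q

    def apply(w):
        # Compute (Q Q^T - v v^T) w: Q Q^T with its top eigenpair removed.
        t = [sum(q[i][j] * w[i] for i in range(nx)) for j in range(ny)]
        u = [sum(q[i][j] * t[j] for j in range(ny)) for i in range(nx)]
        c = sum(vi * wi for vi, wi in zip(v, w))
        return [ui - c * vi for ui, vi in zip(u, v)]

    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    w = [rng.uniform(-1.0, 1.0) for _ in range(nx)]
    lam = 0.0
    for _ in range(iters):
        mw = apply(w)
        norm = math.sqrt(sum(a * a for a in mw))
        if norm < 1e-15:
            return 0.0  # nothing left after deflation: independent sources
        # Rayleigh quotient estimates the largest remaining eigenvalue.
        lam = sum(a * b for a, b in zip(w, mw)) / sum(a * a for a in w)
        w = [a / norm for a in mw]
    return lam

# Quaternary example (22): 1/10 on the diagonal, 1/20 elsewhere.
p = [[1 / 10 if x == y else 1 / 20 for y in range(4)] for x in range(4)]
rho_sq = maximal_correlation_sq(p)
```

For the quaternary distribution (22), the matrix $Q$ has singular values 1 and 0.2, so the routine returns a value within numerical tolerance of 0.04. In practice a library singular value decomposition would usually be preferred to hand-written power iteration.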

Theorem 1 also has significant intuitive appeal since it
explicitly relates the multiterminal source coding problem
to the individual rate distortion functions coupled via the
maximal correlation. This tradeoff between correlation and
achievable rate-distortion tuples is obscured in the well-known
Berger-Tung outer bound due to its use of auxiliary random
variables which often have no physical interpretation (due to
the Markov conditions they satisfy).

Although the examples we have discussed may give the
impression that (21) is nearly tight, we point out that this is
not always the case. Indeed, one can devise examples such
as $X \sim$ Bernoulli(1/2), $Y = X$ with probability 1, $d_y =
0$, and $d_x$ equal to Hamming distortion. In this case (21) is
suboptimal by a factor of 2; however, (19) is tight.
Setting aside contrived examples, we believe that Theorem 1
will give useful bounds for many practical settings of interest
(e.g., sensor networks, binaural recording, etc.).
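
To make the factor of 2 concrete, the following worked computation sketches the example. It assumes the standard rate distortion function of a Bernoulli(1/2) source under Hamming distortion, $1 - h(D_X)$ for $0 \leq D_X \leq 1/2$ with $h$ the binary entropy function, and uses $\rho_m(X, Y) = 1$ because $Y = X$:

```latex
\mathbb{R}_X(D_X) = 1 - h(D_X), \qquad \mathbb{R}_Y(D_Y) = 0, \qquad \rho_m^2(X, Y) = 1,
\\
\text{(21):} \quad R_X + R_Y \geq \tfrac{1}{2}\left(1 - h(D_X)\right),
\qquad
\text{(19):} \quad R_X + R_Y \geq 1 - h(D_X).
```

Since both encoders observe the same sequence, a sum-rate of $1 - h(D_X)$ is also sufficient, so (19) is met with equality while (21) is loose by a factor of 2.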

III. Two Proofs of Theorem 1

In lieu of proving Theorem 1, we shall prove the stronger
result:²

Theorem 2. Suppose $(R_X, R_Y, D_X, D_Y)$ is an achievable
rate distortion tuple for distortion functions $d_x, d_y$. Then

    R_X + \rho_m^2(X, Y) R_Y \geq I(X; \hat{X}, \hat{Y})    (23)
    R_Y + \rho_m^2(X, Y) R_X \geq I(Y; \hat{X}, \hat{Y})    (24)

for some conditional distribution $P_{\hat{X}, \hat{Y} | X, Y}$ satisfying

    \mathbb{E} d_x(X, \hat{X}) \leq D_X    (25)
    \mathbb{E} d_y(Y, \hat{Y}) \leq D_Y.    (26)

Clearly, Theorem 1 follows immediately from Theorem 2
and Definition 2. We note that it suffices to consider finite
alphabets $\mathcal{X}$ and $\mathcal{Y}$ in proving Theorem 1. Indeed, extending
the result to continuous random variables follows by the
usual quantization arguments. Therefore, we shall assume that
$\max\{|\mathcal{X}|, |\mathcal{Y}|\} < \infty$ for the remainder of this section.

We give two different proofs of Theorem 2. Both arguments
rely on the following key lemma.

Lemma 1 (See [7, Equation (3.19)]). If $U \leftrightarrow Y \leftrightarrow X$ form
a Markov chain in that order, then

    I(X; U) \leq \rho_m^2(X, Y) I(Y; U).    (27)

A. A Direct Proof of Theorem 2

First Proof of Theorem 2: Fix $\epsilon > 0$. Since
$(R_X, R_Y, D_X, D_Y)$ is achievable, there exists a
$(2^{nR_X}, 2^{nR_Y}, n)$ code $(f_x, f_y, \phi_x, \phi_y)$ which satisfies
(17) and (18). In order to simplify notation, we write $\hat{X}^n =
\phi_x(f_x(X^n), f_y(Y^n))$ and $\hat{Y}^n = \phi_y(f_x(X^n), f_y(Y^n))$.
With this notation, observe that

    n R_X \geq H(f_x(X^n))    (28)
          \geq I(X^n; f_x(X^n) | f_y(Y^n))    (29)
          = I(X^n; f_x(X^n), f_y(Y^n)) - I(X^n; f_y(Y^n))    (30)
          \geq I(X^n; f_x(X^n), f_y(Y^n)) - \rho_m^2(X^n, Y^n) I(Y^n; f_y(Y^n))    (31)
          \geq I(X^n; \hat{X}^n, \hat{Y}^n) - \rho_m^2(X^n, Y^n) n R_Y    (32)
          = \sum_{i=1}^{n} I(X_i; \hat{X}^n, \hat{Y}^n | X^{i-1}) - \rho_m^2(X^n, Y^n) n R_Y    (33)
          \geq \sum_{i=1}^{n} I(X_i; \hat{X}_i, \hat{Y}_i) - \rho_m^2(X^n, Y^n) n R_Y    (34)
          = \sum_{i=1}^{n} I(X_i; \hat{X}_i, \hat{Y}_i) - \rho_m^2(X, Y) n R_Y.    (35)

In the above string of inequalities,

• (31) is a consequence of Lemma 1 since $f_y(Y^n) \leftrightarrow
Y^n \leftrightarrow X^n$.

• (32) follows from the data processing inequality and the
fact that $I(Y^n; f_y(Y^n)) \leq n R_Y$.

• (34) follows by the memoryless property of the source
and monotonicity of mutual information.

• (35) follows by the tensorization property of maximal
correlation for memoryless sources. That is, $\rho_m^2(X, Y) =
\rho_m^2(X^n, Y^n)$ (See [8, Theorem 1]).

A similar argument, after normalizing by $n$, proves the symmetric inequality

    R_Y + \rho_m^2(X, Y) R_X \geq \frac{1}{n} \sum_{i=1}^{n} I(Y_i; \hat{X}_i, \hat{Y}_i).    (36)

Define

    P(\hat{x}, \hat{y} | x, y) = \frac{1}{n} \sum_{i=1}^{n} \Pr\{ \hat{X}_i = \hat{x}, \hat{Y}_i = \hat{y} \mid X_i = x, Y_i = y \}.

By linearity of expectation, we have

    \mathbb{E} d_x(X, \hat{X}) = \mathbb{E}[d_x(X^n, \phi_x(f_x(X^n), f_y(Y^n)))] \leq D_X + \epsilon
    \mathbb{E} d_y(Y, \hat{Y}) = \mathbb{E}[d_y(Y^n, \phi_y(f_x(X^n), f_y(Y^n)))] \leq D_Y + \epsilon.

Also, since $(X_i, Y_i)$ are identically distributed for all $i$,
convexity of mutual information in the conditional distribution
implies the desired inequalities

    \frac{1}{n} \sum_{i=1}^{n} I(X_i; \hat{X}_i, \hat{Y}_i) \geq I(X; \hat{X}, \hat{Y})    (37)
    \frac{1}{n} \sum_{i=1}^{n} I(Y_i; \hat{X}_i, \hat{Y}_i) \geq I(Y; \hat{X}, \hat{Y}),    (38)

completing the proof.

² Like Theorem 1, the outer bound given by Theorem 2 is efficiently
computable.



B. A Proof of Theorem 2 via Logarithmic Loss

Interestingly, Theorem 2 can also be derived from the recent
results on source coding under logarithmic loss [9]. This
suggests that logarithmic loss may be useful in obtaining other
converse results which are stronger than Theorem 2.

Let $\mathcal{M}(\mathcal{X})$ denote the set of probability measures on $\mathcal{X}$.
For $\hat{x}^{(LL)} \in \mathcal{M}(\mathcal{X})$, the logarithmic loss function $d_{LL} : \mathcal{X} \times
\mathcal{M}(\mathcal{X}) \to \mathbb{R}$ is defined by

    d_{LL}(x, \hat{x}^{(LL)}) = \log \frac{1}{\hat{x}^{(LL)}(x)},    (39)

where $\hat{x}^{(LL)}(x)$ is the probability $\hat{x}^{(LL)}$ assigns to the outcome
$x \in \mathcal{X}$. When $d_x$ and $d_y$ are both logarithmic loss distortion
measures (defined for their respective source alphabets $\mathcal{X}$ and
$\mathcal{Y}$), the rate distortion region is known. The characterization of
this region is given by the following theorem, which is proved
in [9].
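
As a small illustration of definition (39), the sketch below evaluates logarithmic loss for a "soft" reproduction represented as a dictionary from outcomes to probabilities. The representation, the function names, and the choice of base-2 logarithms are assumptions made for this example. It also shows the familiar fact that reporting the true distribution gives an expected loss equal to the entropy, which is why conditional entropies play the role of distortions under this measure.

```python
import math

def log_loss(x, x_hat):
    """Logarithmic loss log2(1 / x_hat(x)) in bits.

    x_hat is a dict mapping outcomes to probabilities (hypothetical
    representation of an element of M(X)).
    """
    p = x_hat.get(x, 0.0)
    if p <= 0.0:
        return math.inf  # zero probability assigned to the true outcome
    return math.log2(1.0 / p)

def expected_log_loss(p_x, x_hat):
    """Expected logarithmic loss when X ~ p_x and x_hat is reported."""
    return sum(p * log_loss(x, x_hat) for x, p in p_x.items() if p > 0)

p_x = {0: 0.5, 1: 0.25, 2: 0.25}
loss_true = expected_log_loss(p_x, p_x)            # equals H(X) = 1.5 bits
loss_uniform = expected_log_loss(p_x, {0: 1 / 3, 1: 1 / 3, 2: 1 / 3})
```

Reporting the uniform distribution instead of the true one yields $\log_2 3 \approx 1.585$ bits, which exceeds the entropy of 1.5 bits.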

Theorem 3. $(R_X, R_Y, D_X, D_Y)$ is achievable under logarithmic
loss if and only if

    R_X \geq I(X; U_X | U_Y, Q)    (40)
    R_Y \geq I(Y; U_Y | U_X, Q)    (41)
    R_X + R_Y \geq I(X, Y; U_X, U_Y | Q)    (42)
    D_X \geq H(X | U_X, U_Y, Q)    (43)
    D_Y \geq H(Y | U_X, U_Y, Q)    (44)

for some joint distribution of the form
$p(x, y) p(q) p(u_X | x, q) p(u_Y | y, q)$ with $|\mathcal{U}_X| \leq |\mathcal{X}|$,
$|\mathcal{U}_Y| \leq |\mathcal{Y}|$, and $|\mathcal{Q}| \leq 5$.

Second Proof of Theorem 2: Since $(R_X, R_Y, D_X, D_Y)$
is achievable, there exists a $(2^{nR_X}, 2^{nR_Y}, n)$ code
$(f_x, f_y, \phi_x, \phi_y)$ which satisfies (17) and (18). By considering
the logarithmic loss reproductions

    \hat{X}_i^{(LL)} = \Pr[X_i = x | f_x(X^n), f_y(Y^n)]    (45)
    \hat{Y}_i^{(LL)} = \Pr[Y_i = y | f_x(X^n), f_y(Y^n)]    (46)

for each index $i = 1, 2, \dots, n$, Theorem 3 guarantees the
existence of a joint distribution $p(x, y) p(q) p(u_X | x, q) p(u_Y | y, q)$
with $|\mathcal{U}_X| \leq |\mathcal{X}|$, $|\mathcal{U}_Y| \leq |\mathcal{Y}|$, and $|\mathcal{Q}| \leq 5$ which satisfies³

    R_X \geq I(X; U_X | U_Y, Q)    (47)
    R_Y \geq I(Y; U_Y | U_X, Q)    (48)
    R_X + R_Y = I(X, Y; U_X, U_Y | Q)    (49)
    \frac{1}{n} \sum_{i=1}^{n} H(X_i | f_x(X^n), f_y(Y^n)) \geq H(X | U_X, U_Y, Q)    (50)
    \frac{1}{n} \sum_{i=1}^{n} H(Y_i | f_x(X^n), f_y(Y^n)) \geq H(Y | U_X, U_Y, Q).    (51)

We now make several observations, from which the claim
follows easily.

First, note that (50) is equivalent to

    I(X; U_X, U_Y | Q) \geq \frac{1}{n} \sum_{i=1}^{n} I(X_i; f_x(X^n), f_y(Y^n)).    (52)

Second, since $R_X + R_Y = I(X, Y; U_X, U_Y | Q)$ and $R_X \geq
I(X; U_X | U_Y, Q)$, we have

    R_Y = I(X, Y; U_X, U_Y | Q) - R_X    (53)
        = I(Y; U_Y | Q) - (R_X - I(X; U_X | U_Y, Q))    (54)
        \leq I(Y; U_Y | Q).    (55)

Third, we observe that

    R_X + R_Y = I(X, Y; U_X, U_Y | Q)    (56)
              = I(X; U_X, U_Y | Q) + I(Y; U_X, U_Y | X, Q)    (57)
              = I(X; U_X, U_Y | Q) + I(Y; U_Y | Q) - I(X; U_Y | Q)    (58)
              \geq I(X; U_X, U_Y | Q) + I(Y; U_Y | Q) - \rho_m^2(X, Y) I(Y; U_Y | Q)    (59)
              = I(X; U_X, U_Y | Q) + (1 - \rho_m^2(X, Y)) I(Y; U_Y | Q)    (60)
              \geq I(X; U_X, U_Y | Q) + (1 - \rho_m^2(X, Y)) R_Y,    (61)

where (59) follows from Lemma 1 and (61) follows from (55)
and the fact that $\rho_m^2(X, Y) \leq 1$.

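We rearrange (61) and apply (52) to obtain the desired
inequality:

    R_X + \rho_m^2(X, Y) R_Y \geq I(X; U_X, U_Y | Q)    (62)
        \geq \frac{1}{n} \sum_{i=1}^{n} I(X_i; f_x(X^n), f_y(Y^n))    (63)
        \geq \frac{1}{n} \sum_{i=1}^{n} I(X_i; \hat{X}^n, \hat{Y}^n)    (64)
        \geq \frac{1}{n} \sum_{i=1}^{n} I(X_i; \hat{X}_i, \hat{Y}_i).    (65)

Here (64) holds by the data processing inequality, since $\hat{X}^n$ and
$\hat{Y}^n$ are functions of $(f_x(X^n), f_y(Y^n))$. A standard convexity
argument (identical to the final step of the direct proof) completes
the argument. ■

³ Establishing the equality in the sum-rate constraint is straightforward.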



C. Remarks 

Typical applications of Lemma 1 begin with a single-letter
characterization of the problem of interest. However,
such characterizations are unknown for most multiterminal
problems. Indeed, characterizing the rate-distortion region for
the multiterminal source coding problem defined in Section II
for general distortion measures $d_x, d_y$ is a longstanding open
problem. In general, maximal correlation (together
with Lemma 1) can be used to obtain meaningful outer bounds
in source coding problems without first appealing to a
single-letter characterization.

For instance, a simple sum-rate bound for the CEO problem
(cf. [10] for a definition) can be given as follows. Suppose the
observations $(Y_1, \dots, Y_k)$ are conditionally independent given
$X$, which should be reproduced at the decoder subject to a
constraint on distortion measured under $d_x$. If $(R_1, \dots, R_k, D)$
is an achievable rate-distortion vector for this CEO problem,
then

    \sum_{i=1}^{k} \rho_m^2(X, Y_i) R_i \geq \mathbb{R}_X(D).

Similar ideas can be applied to non-rate-distortion settings.
As an example, consider the problem of generating common
randomness:

Definition 4. Assume $\{X_i, Y_i\}_{i=1}^{\infty}$ is a 2-DMS with joint
distribution $P_{X,Y}$. A common randomness pair $(C, R)$ is
achievable if, for any $\epsilon > 0$, there exists an integer $n$, an
encoding function

    f_m : \mathcal{X}^n \to \{1, \dots, 2^{nR}\},    (66)

and decoding functions

    f_1 : K = f_1(X^n)    (67)
    f_2 : K' = f_2(Y^n, f_m(X^n))    (68)

which satisfy

    \Pr(K = K') \geq 1 - \epsilon    (69)
    \frac{1}{n} H(K) \geq C - \epsilon    (70)
    \frac{1}{n} H(K | K') \leq \epsilon.    (71)

Let $C(R)$ be the common randomness capacity:

    C(R) = \sup\{C : (C, R) \text{ is achievable}\}.    (72)

In his Ph.D. thesis, Zhao proved the following theorem, which
bounds the maximum number of bits of randomness that can
be "unlocked" for each bit of communication allowed between
users.

Theorem 4 ([11, Theorem 3]).

    \frac{C(R)}{R} \leq \frac{1}{1 - \rho_m^2(X, Y)}.    (73)

Zhao's original proof of Theorem 4, while simple, begins
with a single-letter characterization of the common randomness
capacity $C(R)$, originally due to Ahlswede and Csiszár
[12]. By proceeding along the lines of the direct proof of
Theorem 2, we can obtain an easy, alternate proof of Theorem
4 without appealing to a single-letter characterization of $C(R)$.

Proof of Theorem 4: Fix $\epsilon > 0$ and consider a scheme
which satisfies (69)-(71), with $C = C(R)$. Then, we have:

    nR + n \rho_m^2(X, Y) C(R)    (74)
        \geq nR + \rho_m^2(X, Y) H(K)    (75)
        \geq I(f_m(X^n); X^n, K) + I(K; Y^n)    (76)
        \geq I(f_m(X^n); X^n, K | Y^n) + I(K; Y^n)    (77)
        = I(f_m(X^n), Y^n; X^n, K)    (78)
        \geq I(K'; K)    (79)
        \geq n(C(R) - \epsilon),    (80)

where (76) follows from Lemma 1. ■
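
Comparing the first and last lines of this chain gives the statement of Theorem 4. As a short worked step, under the reading that (74)-(80) yield $nR + n \rho_m^2(X, Y) C(R) \geq n(C(R) - \epsilon)$, dividing by $n$ and rearranging gives:

```latex
R + \rho_m^2(X, Y)\, C(R) \geq C(R) - \epsilon
\quad\Longrightarrow\quad
\left(1 - \rho_m^2(X, Y)\right) C(R) \leq R + \epsilon .
```

Letting $\epsilon \to 0$ then gives $C(R) \leq R / (1 - \rho_m^2(X, Y))$ whenever $\rho_m(X, Y) < 1$.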

IV. Conclusion 

We give a simple, intuitive outer bound for the multiterminal
source coding problem in terms of Hirschfeld-Gebelein-Rényi
maximal correlation and the rate distortion functions
for each source. Compared to other standard outer bounds
which are parameterized by auxiliary random variables, the
proposed outer bound has the advantage of (i) being efficiently
computable, and (ii) having an explicit expression.
Roughly speaking, our main result indicates that compressing
the sources as if they were independent yields near-optimal
sum-rate performance, provided the sources do not have large
maximal correlation.